43 research outputs found
Mining Missing Hyperlinks from Human Navigation Traces: A Case Study of Wikipedia
Hyperlinks are an essential feature of the World Wide Web. They are
especially important for online encyclopedias such as Wikipedia: an article can
often only be understood in the context of related articles, and hyperlinks
make it easy to explore this context. But important links are often missing,
and several methods have been proposed to alleviate this problem by learning a
linking model based on the structure of the existing links. Here we propose a
novel approach to identifying missing links in Wikipedia. We build on the fact
that the ultimate purpose of Wikipedia links is to aid navigation. Rather than
merely suggesting new links that are in tune with the structure of existing
links, our method finds missing links that would immediately enhance
Wikipedia's navigability. We leverage data sets of navigation paths collected
through a Wikipedia-based human-computation game in which users must find a
short path from a start to a target article by only clicking links encountered
along the way. We harness human navigational traces to identify a set of
candidates for missing links and then rank these candidates. Experiments show
that our procedure identifies missing links of high quality.
Effective and Efficient Similarity Index for Link Prediction of Complex Networks
Predictions of missing links of incomplete networks like protein-protein
interaction networks or very likely but not yet existent links in evolutionary
networks like friendship networks in web society can be considered as a
guideline for further experiments or valuable information for web users. In
this paper, we introduce a local path index to estimate the likelihood of the
existence of a link between two nodes. We propose a network model with
controllable density and noise strength in generating links, and we collect
data on six real networks. Extensive numerical simulations on both modeled
networks and real networks demonstrated the high effectiveness and efficiency
of the local path index compared with two well-known and widely used indices,
the common neighbors and the Katz index. Indeed, the local path index provides
predictions as accurate as those of the Katz index while requiring much less
CPU time and memory space, which makes it a strong candidate for
practical applications in data mining of huge-size networks.
Comment: 8 pages, 5 figures, 3 tables
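As a rough illustration of the local path index named in this abstract: the abstract itself gives no formula, so the sketch below assumes the definition commonly used in the link-prediction literature, S = A^2 + epsilon * A^3, where A is the adjacency matrix and epsilon is a small free parameter. The function names, the toy graph, and the epsilon value are illustrative assumptions, not taken from the paper.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def local_path_index(adj, epsilon=0.01):
    """Score node pairs as A^2 + epsilon * A^3 (assumed definition).

    A^2 counts paths of length 2 (common neighbors); the A^3 term adds a
    small contribution from paths of length 3.
    """
    a2 = mat_mul(adj, adj)
    a3 = mat_mul(a2, adj)
    n = len(adj)
    return [[a2[i][j] + epsilon * a3[i][j] for j in range(n)]
            for i in range(n)]


# Toy example (hypothetical): the path graph 0 - 1 - 2 - 3.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = local_path_index(adj, epsilon=0.5)
print(scores[0][2], scores[0][3])  # 1.0 0.5
```

In this toy graph, nodes 0 and 3 share no neighbor, so a pure common-neighbors score would be 0, while the length-3 term still gives the pair a nonzero score. Only matrix products are needed here, whereas the Katz index sums over paths of all lengths.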
Just-for-Me: An Adaptive Personalization System for Location-Aware Social Music Recommendation
The fast growth of online communities and the increasing popularity of internet-accessing smart devices have significantly changed the way people consume and share music. As an emerging technology to facilitate effective music retrieval on the move, intelligent recommendation has received great attention in recent years. While a large amount of effort has been invested in the field, the technology is still in its infancy. One of the major reasons for this stagnation is the inability of existing approaches to comprehensively take multiple kinds of contextual information into account. In this paper, we present a novel recommender system called Just-for-Me to facilitate effective social music recommendation by considering users' location-related contexts as well as global music popularity trends. We also develop a unified recommendation model to integrate the contextual factors and music content simultaneously. Furthermore, pseudo-observations are proposed to overcome the cold-start and sparsity problems. An extensive experimental study based on different test collections demonstrates that the Just-for-Me system can significantly improve recommendation performance at various geo-locations.
Empirical analysis of web-based user-object bipartite networks
Understanding the structure and evolution of web-based user-object networks
is a significant task since they play a crucial role in e-commerce nowadays.
This Letter reports the empirical analysis on two large-scale web sites,
audioscrobbler.com and del.icio.us, where users are connected with music groups
and bookmarks, respectively. The degree distributions and degree-degree
correlations for both users and objects are reported. We propose a new index,
named collaborative clustering coefficient, to quantify the clustering behavior
based on the collaborative selection. Accordingly, the clustering properties
and clustering-degree correlations are investigated. We report some novel
phenomena well characterizing the selection mechanism of web users and outline
the relevance of these phenomena to the information recommendation problem.
Comment: 6 pages, 7 figures and 1 table
Link prediction in complex networks: a local naïve Bayes model
The common-neighbor-based method is a simple yet effective way to predict
missing links; it assumes that two nodes are more likely to be connected if
they have more common neighbors. In this method, each common neighbor of two
nodes contributes equally to the connection likelihood. In this Letter, we
argue that different common neighbors may play different roles and thus make
different contributions, and we propose a local naïve Bayes model accordingly.
Extensive experiments were carried out on eight real networks. Compared with
the common-neighbor-based methods, the present method can provide more accurate
predictions. Finally, we give a detailed case study on the US air
transportation network.
Comment: 6 pages, 2 figures, 2 tables
Predicting Missing Links via Local Information
Missing link prediction of networks is of both theoretical interest and
practical significance in modern science. In this paper, we empirically
investigate a simple framework of link prediction on the basis of node
similarity. We compare nine well-known local similarity measures on six real
networks. The results indicate that the simplest measure, namely common
neighbors, has the best overall performance, and the Adamic-Adar index performs
the second best. A new similarity measure, motivated by the resource allocation
process taking place on networks, is proposed and shown to have higher
prediction accuracy than common neighbors. It is found that many links are
assigned the same scores if only the information of the nearest neighbors is
used. We therefore design another new measure that exploits information of the
next nearest neighbors, which can remarkably enhance the prediction accuracy.
Comment: For International Workshop: "The Physics Approach To Risk:
Agent-Based Models and Networks", http://intern.sg.ethz.ch/cost-p10
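To make the local similarity measures named in this abstract concrete, here is a minimal sketch of three of them for a single node pair. The abstract does not give formulas, so the sketch assumes the standard definitions from the link-prediction literature: common neighbors counts shared neighbors, Adamic-Adar weights each shared neighbor z by 1/log(k_z), and the resource allocation index weights it by 1/k_z, where k_z is the degree of z. The helper names and the toy graph are illustrative assumptions.

```python
import math


def build_graph(edges):
    """Build an undirected graph as a dict mapping node -> set of neighbors."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def neighbor_scores(graph, x, y):
    """Return (common neighbors, Adamic-Adar, resource allocation) for x, y.

    Every common neighbor z of two distinct nodes has degree >= 2, so
    log(k_z) is strictly positive and the division is safe.
    """
    common = graph[x] & graph[y]
    cn = len(common)
    aa = sum(1.0 / math.log(len(graph[z])) for z in common)
    ra = sum(1.0 / len(graph[z]) for z in common)
    return cn, aa, ra


# Toy example (hypothetical): a and b share a low-degree neighbor c (degree 2)
# and a hub d (degree 4).
graph = build_graph([
    ("a", "c"), ("b", "c"),
    ("a", "d"), ("b", "d"), ("d", "e"), ("d", "f"),
])
cn, aa, ra = neighbor_scores(graph, "a", "b")
print(cn, ra)  # 2 0.75
```

Both weighted measures discount the hub d relative to c, and resource allocation does so more strongly (1/4 against 1/2) than Adamic-Adar (1/log 4 against 1/log 2).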
Offering collaborative-like recommendations when data is sparse: The case of attraction-weighted information filtering
We propose a low-dimensional weighting scheme to map information filtering recommendations into more relevant, collaborative filtering-like recommendations. As in content-based systems, the closest (most similar) items are recommended, but distances between items are weighted by attraction indexes representing existing customers' preferences. Hence, the most preferred items are closer to all the other points in the space, and consequently more likely to be recommended. The approach is especially suitable when data is sparse, since attraction weights need only be computed across items, rather than for all user-item pairs. A first study conducted with consumers in an online bookseller context indicates that our approach has merit: recommendations made by our attraction-weighted information filtering recommender system significantly outperform pure information filtering recommendations, and compare favorably to data-hungry collaborative filtering systems.